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Abstract. We define a numerical metliod tiiat provides a non-parametric estimation of the kernel shape 
in symmetric multivariate Hawkes processes. This method relies on second order statistical properties 
of Hawkes processes that relate the covariance matrix of the process to the kernel matrix. The square 
root of the correlation function is computed using a minimal phase recovering method. We illustrate our 
method on some examples and provide an empirical study of the estimation errors. Within this framework, 
we analyze high frequency financial price data modeled as ID or 2D Hawkes processes. We find slowly 
decaying (power-law) kernel shapes suggesting a long memory nature of self-excitation phenomena at the 
microstructure level of price dynamics. 
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1 Introduction 

Although the concept of self excitement was commonly 
used for a long time by seismologists (see [20] and the ref- 
erences therein), Alan Hawkes was the first to provide a 
well defined point process with a self exciting behavior [131 
[HlfT^ . This model was introduced to reproduce the rip- 
ple effects generated after the occurrence of an earthquake 
[SniT]. Since then, it has been successfully used in many 
areas ranging from seismology (see e.g., [53], for a recent 
review), biology [28) . to even criminology and terrorism 
[SniTO] (cf [19] and references therein for a detailed review 
of Hawkes processes and their applications). As far as fi- 
nancial applications are concerned, since transactions and 
price changes are discrete events, Hawkes processes have 
naturally been generating more and more interest. Appli- 
cations can be found in the field of order arrival rate mod- 
eling [TS1|71|53] , noise microstructure dynamics [5] , volatil- 
ity clustering [TT] , extreme values and VaR estimation [5] 
and credit modeling [12] . 

Hawkes models account for a self exciting behavior of 
events by which the arrival of one event increases the prob- 
ability of occurrence of new ones. In its most basic form 
and in the one dimensional case, the Hawkes process is a 
counting process defined by At , the rate of arrival of events 
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by: 

At = A* + / (t't-sdNs (1) 

where /x > is a constant background rate, Nt is the 
cumulative counting process and 4> a positive real function 
called decay kernel. We can clearly see in equation ([1]) that 
when some event occurs at time we have dNt = 1 and 
hence dXt = 0o- The influence of the event is transmitted 
to future times through (j) such that at time u > t the 
increase in At due to the time t event is 4>u-t- Thus a self 
exciting behavior is observed. 

A basic issue in applications of Hawkes processes con- 
cerns their estimation. In early applications, parametric 
estimation used spectral analysis by means of the Bartlett 
spectrum (the Fourier transform of the autocovariance of 
the process) [31[S] and it was indeed through that light 
that Hawkes presented his model. A maximum likelihood 
method for estimating the parameters of exponential, power 
law and Laguerre class of decay kernels was developed in 
(26l[24| and it became the standard method for estimating 
Hawkes processes. Furthermore, the Laguerre decay ker- 
nels were seen to be very pertinent decay kernels because 
they allowed to account for long term dependencies as well 
as offering short term flexibility. More recently, other types 
of estimation procedures were developed. When the form 
of the decay kernel is unknown, non-parametric methods 
are desirable because they give an idea of their general 
shape. For example, by using the branching property of 
the Hawkes process [32], the authors in [50] and [TB] were 
able to provide Expectation-Maximization algorithms to 
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estimate both background rate and the decay kernel. In 
[28] . the authors present also an algorithmic method for 
decay estimation by using a penalized projection method. 

In this paper, we propose an alternative simple non 
parametric estimation method for multivariate symmet- 
ric Hawkes processes based on the Bartlett spectrum. We 
present the method and its numerical feasibility without 
going too much in details about convergence speeds or 
error optimization. By studying one dimensional and 2- 
dimensional examples, we show that our approach pro- 
vides reliable estimates on both fast and slowly decaying 
kernels. We then apply our method to high frequency fi- 
nancial data and find power-law kernels. This implies that 
the arrival of events display long range correlations, a phe- 
nomenon well known in finance, similarly to what was sug- 
gested in 

This paper is organized as follows. In section [51 we in- 
troduce a basic version of a multivariate Hawkes processes. 
We place ourselves in the context of an n— dimentional lin- 
ear Hawkes process with a constant background rate and a 
nonnegative decay kernel. We set some notations and give 
out a martingale representation of the rate function Xt in 
Eq. ([T]). In section [21 we study the autocovariance of the 
process. Along the same line as [M], we establish its rela- 
tionship with the decay kernel in both direct and Fourier 
spaces. The estimation method in the case of symmet- 
ric Hawkes processes is then provided in section [H This 
method, based on a Hilbert transform phase recuperation 
method, is explicitly detailed in both univariate and spe- 
cial symmetric bivariate cases. In section [SI the method 
is illustrated by numerical examples for both exponential 
and power law kernels. We also address some statistical is- 
sues concerning the estimation errors. Finally, in section[ni 
we apply our method to high frequency financial data for 
which we deduce a long range nature of the decay kernels. 

2 Multivariate Hawkes Processes 
2.1 Notations and Definitions 

As introduced by Hawkes in [T3] and [13], let us consider 
an n-dimensional point process {Nt)t>o, where Nl 1 < i < 
n represents the cumulative number of events in the i*'' 
component of the process Nt up to time t. 

The conditional intensity vector at any time t is as- 
sumed to be a random process that depends on the past 
as ^ 

At = + / (t>t-sdNs (2) 

where ^ is a vector of size n with strictly positive compo- 
nents (/i* > 0) and is an n X n matrix referred to as the 
decay kernel. 

Notations 1 In the following 

^ M.n,pi^) (resp. A4n^p{G)) denotes the set ofnxp ma- 
trices with values in IR (resp. C). For any matrix M 
(resp. vector v), M^^ (resp. v^) denotes its elements. 



— For any function ft, fz — /j; e ftdt corresponds to 
its Laplace transform. 

— By extension if Mt G Mn,p{R) then G A^ri,p(C) 
corresponds to the matrix whose elements are the Laplace 
transforms of the elements of Mt . 

Using these notations, one of the main results of Hawkes 
is that if 

HI the kernel (/>i G M.n,n is positive {(fif' > 0, Vt) and 

causal = 0, Vt < 0), 
and ^ 

H2 the spectral radius of 0o (i.e., its largest eigen value) 
is strictly smaller than 1, 

then {Nt)t>o is a n-dimensional point process with sta- 
tionary increments. The conditional intensity At is itself a 
stationary process with mean 

A = E{Xt) = E{dNt)/dt. (3) 

Combining this last equation with Eq. ([5]), one easily gets 
A = ^-'r (j)t-sdsA = /i -|- (f>uduA and consequently 

A = {1- 0o)" V, (4) 

where I refers to the n x n identity matrix. 

Before moving on, we need to introduce some more 
notations that will be used all along the paper. 

Notations 2 If At e Mm,n(R) and Bt £ Mn,p(fi) then 
the convolution product of At and Bt is naturally defined 
as A -k Bt = /jfj AsBt-sds = At-sB^ds. Of course it 
is associative and distributive however it is generally not 
commutative (unless At and Bt are commutative). The 
neutral element is Sit, i-e-, the diagonal matrix with Dirac 
distribution on the diagonal. In the following we will use 
the notation : 

A-kdNt^ / At^sdN,. 

Combining both Notations [1] and [H it is easy to show that 
the convolution theorem on matrices translates in 

aTb, = A,B,. 



2.2 Martingale representation of A* 

We roughly follow a similar path to Hawkes in [T3]. Let 
{Mt)t>o be the martingale compensated process of {Nt)t>o 
defined by: 

dMt = dNt -Xtdt . (5) 

Then At can be represented as a stochastic integral with 
respect to the martingale {Mt)t>a' 

Proposition 1 One has: 

Xt^ A + ^i.dMt, (6) 
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where tf'j is defined as 

oo 
n=l 

where ^j*"'' refers to the n*"^ auto-convolution of (j)t (i.e., 

Proof Using equations ^ and ([5]) one has: 

At = ^ + ★ dNt = n + 4>-k dMt + Xt, 

and consequently 

((51 - 0) 7k- At = /I + ★ dMt. 

Let us note that the inverse of 6It — (f't for the convolution 
product is nothing but ht — Sit + ^'t- Thus convoluting 
on each side of the last equation by /it, one gets (since 
h*(l)t = ^t) 

Xt ^ hi^ fi + Wi^dMt. 

Since is a constant one has h fi = ^iHq. Using the 
convolution theorem one gets fiho = (I — (/)o)^^/i = A 
which proves the proposition. 

3 The covariance matrix of Hawkes processes 

The kernel estimator we are going to build is based on 
the empirical auto-covariance of {Nt)t>o. This section is 
devoted to the covariance matrix of the n-dimensional 
Hawkes processes. However, we first discuss some useful 
results about their infinitesimal auto-covariance function. 

3.1 The infinitesimal covariance 

Let us define the infinitesimal covariance matrix: 
i^t-t' ^ E{dNtdNl), 

where denotes the hermitian conjugate of a matrix 
M. Along the same way as Hawkes in [T3], vt-t' can be 
related to the decay kernel cf). We present in proposition [3] 
an equation linking the infinitesimal covariance matrix to 
A and W but, unlike Hawkes, we express v explicitly as a 
function of if' and A instead of an integral equation in v. 
This result will be at the heart of the estimation method 
we propose in this paper. 

Proposition 2 Let {Nt)t>o be an n-dimensional Hawkes 
process with intensity At as defined in Section \2.1\ ( as- 
suming both HI and H2). Let Wt = ^-t. We have the 
following result: 

E{dNtdNl) = (aA^ + S5t-t' + ^t-t'S + 

+ ^i.i:^l_A dtdt' 

where S is the diagonal matrix defined by S" = A^ for all 
1 < i <n and St is the Dirac distribution. 



Proof By using equation ([5]), we can write: 

E{dNtdNl) ^ E{{dMt + Xtdt){dMt' + Xt'dt'^) 



= E{dMtdMl) (9) 

+E{XtdMl)dt (10) 

+E{dMtX\,)dt' (11) 

+E{XtX\,)dtdt' (12) 



We begin by noticing that (thanks to martingale prop- 
erty), E{dMtdMl) ==0, yt ^ t'. As for when t = t' , we 
have for all 1 < i < j < n 

E{dMidMl) = 

since the Af"s (and hence the M*'s) for \ <i <n have no 
jump in common. Moreover, for i — j we have 

E{dMldMl) = A'dt 

because E{dMidMi) = E{dNldNi) and E{dNidNi) = 
E{dNl) = A^dt since the jumps of A^t are of size 1. To 
sum up, the term Q becomes: 

E{dMtdMl) = SSt-t'dtdt' (13) 

The remaining terms can be then calculated along the 
same line. Replacing At's expression from equation ^ in 
the term pU]) . gives: 

E{XtdMl)dt ^ E {Wi^dMtdMl^j dt, 

or in other words 

E{XtdMl)dt= / ^t-sE{dMsdMl)dtds 

and thanks to equation (|13p . 

E{XtdMl)dt^ / ^t-sSSs-t'dsdt'dt 

that is simplified to: 

E{XtdMl)dt = ^t-t'Sdtdt' (14) 
Similarly, the term (ITTI) becomes: 

E{dMtX\,)dt' = S^l_^dtdt' (15) 

and finally, using Eq. ([T^. the term (fT^ can be written 
as: 

E{XtX\,)dtdt' = [ae{X\,) + E{{1r dMt)X\^ dtdt' 
= (^AA"^ + j >Ft-sE(dMsXlds)^ dtdt' 

= (^AA'' + J Wt-sS'^l^.ds)^ dtdt' 

By setting m = i' — s, we get: 

E{XtX\,)dtdt' = (ylyl^ dtdt', (16) 

which ends the proof of the proposition. 
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3.2 The covariance matrix 

The (normalized) covariance matrix of the Hawkes process 
can be defined, at scale h and lag t, by 



h-^C0v{Nt+h - NuNt+h+r - Nt+r) , (17) 



where we normalized by h in order to avoid a trivial scale 
dependence. Let us note that, since the increments of Nt 
are stationary, the previous definition does not depend on 
t. Thus, it can be rewritten as 

4'') = (^{J^ dN, - ^^)(^"^'' dNj - A^h)^ (f8) 

It is clear that this quantity can be easily estimated on 
real data using empirical means. As we will see, the non 
parametric estimation of the decay kernel we propose in 
this paper is based on these empirical estimations. More 
precisely, it is based on the following Theorem. 

Theorem 1 Let g\^'^ = (1 — x)^. v^^^ can he expressed 
as a function of gi^^ and ■ 



Proof Let us begin the proof by noticing that for any func- 
tion / with values in R"'' we have: 



h-^ J ft-t'dt'dt = f^g^'^l 



(20) 



It follows that 
1 



V. 



[h) 



T+h 

^E^j dN. I dNl. 



h pT+h ^ 

dNsA^h-Ah / dNl+AA^h^ 




dNfdNl - AA^h^ 



Using equation we can split dNtdN^, into four parts 
and write: 



1 



h fT+h 



Jt 



By applying equation (j20p to each of the terms under the 
double integral we get equation (fT9)) and achieve the proof. 



It is more convenient to rewrite the result of theorem 
13.21 in Laplace domain. Since (^f*")^ — 0", Eq. ([7]) trans- 
lates into : 



+ 00 



1^. 



(21) 



and conversely: 

d, = {l + $,)-^^z (22) 
Theorem 13.21 gives way to the following corollary. 
Corollary 1 In Laplace domain equation il9\) becomes: 

vf) = gW{l + + (23) 

4 Non-parametric estimation of the kernel 0f 
4.1 The estimation principle 

In this paper, we aim at building an estimator of based 
on empirical measurements of ui'*' . Let us note that empir- 
ical measurements of wi'''* are naturally obtained replacing 
probabilistic mean by empirical mean in Eq. (jl8p (see [3] 
for proof of convergence of the empirical mean towards 
the probabilistic mean). Thus, in order to build an es- 
timator, we need to express (jjt as a function of Vt^\ In 
the Laplace domain, since a given ipz corresponds to a 
unique <j)z (through Eq. (I^ l. it translates in trying to 

express ipz as a function of vi^^ which exactly corresponds 
to inverting Eq. (i.e., computing the square root of 

(I -I- 'I'*)S{I + !f'*)'^). Indeed, knowing V'z, one easily gets 
02 (from Eq. (P^ ) and finally (pf Actually, let us note 
that, from a practical point of view, we don't need to work 
in the full complex domain ^ G C of the Laplace trans- 
form. Working with the Fourier transform restriction (i.e., 
z = iuj with cj G R) is enough to recover 0t. 



Dealing with cancelations of gi''^ . The first problem that 
seems to appear for inverting this formula (1231) (i.e., ex- 
pressing ij^z as a function of Si''^ only for z = iuj) is the 
scalar term 'g^z'^ that may vanish. Indeed, since gf''' = (1 — 
J|;^)+, its Fourier transform, g'^^ = (4/^^/1) sin^(w/i/2), 
cancels for all u of the form n G Z, n 7^ 0. Actually, 
this is not a real problem as long as 

— T in the empirical estimation of wi'''' is sampled using 
a sampling period A greater than h/2 

— Z\ is small enough so that v'^l^ can be considered to 
have a compact support 



Indeed, if this is the case, then the estimation of u-^^ (prac- 
tically obtained by taking the Discrete Fourier Transform 
(DFT) of the sampled signal (w^^)fc) will be equal on 

[-7r/Z\,7r/Z\] to the product of the DFT of gi''^ (which 
does not vanish since [— 7r/Z\, 7r/Z\] C [— 7r//i, 7r//i]) and 

the DFT of [5r + ^)i:*{Sr + ^)t- Consequently, as long 
as h is small enough (so that A can be chosen greater than 
h and small enough), dividing on both hand sides Eq. 

by gi'^^ is not, from a practical estimation point of view, 
a real problem. So, in the following we will write 

^di'^^/gi'^l (24) 



n=l 



without bothering with eventual cancelations of g. 
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Computing the square root of (I + + ■ In 

the estimation process, once we have estimated Wr''' and 
consequently (through DFT) wi''-' (for z = iuj), using Eq 
dm), we can estimate (I + + . We need to 

go from there to the estimation of 'Fz (then using Eq. 
(|22p . we get (j)z and then by inverse Fourier transform 
This problem requires therefore to take the square root 
of the left hand side of Eq. (|24p . In dimension n — 1, 
it means being able to go from |1 + if'^p to 'F^. There is 
clearly a phase determination problem. We will see that, 
in dimension n = 1, this phase is uniquely determined 
by the hypothesis HI and H2. However, in dimension 
n > 1, they are many possible solutions and determining 
the correct one is not necessarily possible in general. We 
need to make a strong additional hypothesis. 



4.2 Further hypothesis on the l<ernel <^t 

In the following, we suppose that 

H3 S = XI with V i, E{X\) = A G R, and can be diag- 
onalized into a matrix D with some constant unitary 
matrix U (that does not depend on z): 



(25) 



Let us point out that, though always true in dimension 
n = 1, H3 is a strong hypothesis for dimension n > 2. 
Clearly it is satisfied in the case all the components of 
the process are identically distributed, i.e. the process is 
invariant under arbitrary permutations. Other cases are 
very specific and consequently are not discussed in this 
paper. 



4.3 The estimator 

Using Eq. (HT)) along with H3, one gets that I + §* is 
diagonalizable in the same basis U and that 



i + w; = uHi- Diy^u, (26) 

and thus from (|24p. one gets 

Ez = Udi^^U^iXgi'^y), (27) 

where is the diagonal matrix with the real positive 
coefficients : 

= (28) 

So the estimation problem reduces to being able to re- 
cover the coefficients D^*^ from the coefhcients E^'^ = 1 1 ^ 
£)fefe|-2^ This problem is solved by the following Lemma. 

Lemma 1 Let k be fixed. Let z — iui (with a; G RJ. Then 



(1 _ Df^^^)-^ = ^°s|s"|-iff(^iog|BL'=|)^ (29) 
where the operator H{.) refers to the Hilhert transform 



The proof is based on the following theorem: 
Theorem 2 (Paley- Wiener [27]) Let's suppose that we 



observe the amplitude \fi^ 
real filter ft . // 



of the Fourier transform of a 



l0g(|/»c.|) 



duj < oo, 



(30) 



then the filter gt defined by its Fourier transform 

^,„=e^°g(l/."l)-^('°g(l/."l)), (31) 

is the only causal filter (i.e., supported by R^j, which is 
a phase minimal filte^ and which satisfies \'giu\ = \ fiu}\- 

Proof of the Lemma. It is a simple application of this the- 
orem with \f,^\ = y/E^ = |1 - Olf'^l^^- Indeed, let us 
first show that — (1 — )~^ is a minimal phase filter, 
i.e., that both the poles and zeros are such that ^{z) < 0. 

— Let z be a zero of (1 — D^'')^^, then it is a pole of 
Z)^'^ and consequently of (pz- However, from HI and 
H2 one concludes that \(f>z\ < J^°° |e~^*|(/)t(it cannot 
be infinite, unless K(z) < 0. 

— Let z be a pole of (1-D^'')^^ It thus satisfies D^'^ — 1. 

Thus 1 is an eigenvalue of (j)o which is in contradiction 
with H2. Thus (1 - D'^'")'^ has no pole. 

Consequently gz — {I ~ D^^)^^ is a minimal phase filter. 
Moreover, since every coefficients of (j)t are positive and in 
L^ , the Fourier transforms D^^\ for any k, are continuous 
functions of lj and goes to at infinity. Along with the fact 
that { \ — D^^)~^ has no pole and has its zeros on the half- 
plane 3?(z) < we easily conclude that \ fii^\ = |1 — D^'^l"^ 
satisfies ((30|) . The theorem above can be applied and the 
Lemma follows. 

Main steps for kernel estimation. The different steps 
for the final kernel estimator, in the fully symmetric case, 
can be summarized as follows 

— Set A small enough (see Section 4.1) and fix h = A, 

— Estimate the unconditional intensity A, 

— Estimate the auto-covariance operator v^^^ and com- 
pute its Fourier transform v^^^ , 

— Compute using Eq. ([Ml)- Diagonalize 
it and compute the matrix Eii^ defined by Eq. (j27p . 

— Compute the diagonal matrix Dii^ using (I29L 

— Go back to the initial basis and Inverse Fourier trans- 
form to get the estimation of (j)t. 



4.4 Some particular cases 

4.4.1 The one dimensional case n = 1 

As we already pointed out, the hypothesis H3 is always 
true in this case since all the functions are scalar functions. 



^ a minimal phase filter 25 is a filter whose all the zeros and 
the poles of its Laplace transform satisfy ^(z) < 
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Thus, the phase determination problem is solved without 
adding any assumption apart from HI and H2. The kernel 
estimator simply consists in first computing 

+ (32) 

and then inverse Fourier transform of 

Z^l- e-i°s|i+'^-l+'^('°g|i+^-l). (33) 



We estimate Vr from the realization of a Hawkes pro- 
cess (Xt), i > (ID or 2D). We then strictly follow the 
method described in previous section in order to estimate 
the decay kernel (j). wi'*' is sampled at rate A (i.e., t = nA) 
up to a maximum lag Tmax- As explained in Section |4.1[ 
in order to avoid problems related to the zeros of g*-''^ , we 
naturally choose h = A. From a practical point of view, 
A has to be chosen small enough in order to avoid Fourier 
aliasing. 



4.4.2 The bisymmetric 2-dimensional case n = 2 

In the two-dimensional case, the hypothesis H3 is satisfied 
in the particular case 



(34) 



and the kernel 4>t is bisymmetric, i.e., has the form 



The matrix of the Laplace transform (pz can be indeed 
decomposed as follows: 



0. = C/^("^' 





All , 2l2 



u 



where = ^ / J 

Diagonalizing and identifying the diagonal coefficients 
in both hand sides of (P7)) leads to 



~.(h)ii 



A? 



(h) 



= 11 + 1?" + 



Mh)ll _Mh)l2 



(36) 
(37) 



5.1 The case n = 1 



5.1.1 Exponential kernel 



We first simulate a one dimensional Hawkes process with 
an exponential kernel: 



ae 



-^1 



t>o 



(39) 



where we choose (/i — 1,Q! = 1,/3 = 4). This gives 
(35) 00 = 1/4^ A = 4/3. The simulated sample contains 130000 



jumps (it is approximately T ~ 10^ seconds long). In fig- 
ure [IK a) we have reported the estimated kernel function 
4>t (circles) on top of the true kernel (solid line). We see 
that the estimated kernel function is, up to some noise, 
very close to the real kernel 



5.1.2 Power Law Decay 

We now consider a power-law decaying kernel defined as: 

0t = a{t + ^)Ht>o (40) 



with /3 < —1. In this case we have (po = ~^g^7 
choose a = 32 and /3 = — 5 and 7 = 2 making 



Applying the same method used in the case ji = 1 to 
these last two equations, we get an estimate of 'I'. Finally 
we apply 

^z = {l + $,)-^$z (38) 

giving (p. Applying the inverse transform to (j) we finally 
get 4). 

In the following section we illustrate these results and 
our methods on numerical simulations of ID and 2D Hawkes 
processes. 



5 Numerical illustrations 

Let us discuss some examples illustrating the estimation 
method as defined previously using simulated Hawkes pro- 
cesses. All the simulations have been performed with the 
thinning algorithm described in [22) . 



/3+1 



We 

=0.5 < 1 

and A = 2 again with 130000 jumps (T ~ 65000 sees). The 
kernel estimated on a single sample is reported in[IJb). 
Once again, one can see that, up to an additive noise, the 
estimated kernel fits well the real one. Let us point out 
that, since the decay is much slower than in the previous 
exponential case, we chose the maximum lag Tmax to be 
ten times as much. 



5.1.3 Error Analysis 

Let us briefly discuss some issues related to the errors asso- 
ciated with our kernel estimates. In ref. a central limit 
theorem has been proved that shows that, asymptotically, 
the errors of the empirical covariance function estimates 
are normally distributed with a variance that decreases as 
(or A''"-'^ for A fixed). One thus expects the same kind 
of results in the estimates of the components of 0. 
Let us define the estimation error as: 



E 

fe=i 



(41) 
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Fig. 1. Non parametric estimation of a one dimensional 
Hawkes process using method described in Section r4.4.1l from a 
unique realization with 130000 jumps. Estimated (o) and ana- 
lytical kernel (solid line) are shown, (a) Case of the exponential 
decay kernel (Eq. ^) with a = 1, ;3 = 4. We used A = 0.01 
and Tinax ~ 2. (b) Case of the power-law decay kernel (Eq. 
((ggi)), with a = 32, /? = -5 and 7 = 2. We used A = 0.05 

and Tmax = 20. 



where (p'''^'' is the estimated kernel. In Fig. [2fa), we have 
reported as a function of the sample length T for the 
same realization as the one used in Fig. [Ija) . As expected, 
one observes a behavior very close to T~^. Let us now look 
at the error between the analytical 4> and the estimated 
one for a fixed t > 0. We find that, for each t, the series 
of errors are centered gaussian and mainly uncorrelated. 
Their variance increases as t decreases to 0. The normality 
of the observed errors is illustrated in Fig. ^h) where 
we report, for 3 different values of t, the qq-plots of the 
empirical error pdf (with standardized variance) versus 
the normal pdf. 

Let us point out that the estimation error depends, 
a priori, on the sampling rate A. Numerical simulations 
show that there is an optimal choice for the sampling rate 
parameter A. Indeed, we found that the estimation error 
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Fig. 2. Error analysis of a one dimensional Hawkes process 
with the same exponential kernel as in Fig. [TJa) with A = 0.01 
and Tmax=2- (a) Mean square error as a function of the sam- 
ple length T in log-log coordinates. The solid-line corresponds 
to the curve the expected behavior, (b) qq-plots of the 
Normal probability distribution function (pdf) versus empiri- 
cal error pdf for (j)t for three different values of t. Here T — 10"" 
seconds is fixed. The empirical error pdf 's have been normal- 
ized to have the same variance (the variance increases as t goes 
to 0). 



is minimum for a finite value A — A* which is neither 
large nor close to 0. For instance, for the process used in 
Fig. [Ija), we found A* ~ 0.15. The fact that there exists 
such a minimum is not that surprising. On the one hand, 
if A is too large the error is dominated by Fourier aliasing. 
On the other hand, if A is too small, estimating Wt"^-* for 
a fixed T > corresponds to estimating the correlation 
between the increments at scale A of two point processes. 
Such an estimation is well known to converge to when 
the time-scale A goes to 0. In the Finance literature (see, 
e.g., [2]), this is known as the Epps effect. Of course, the 
optimal value A* is not known a priori. However, it is 
natural to think that it is inversely proportional to A. On 
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practical situations we advocate to start choosing A of the 
order of 0.1 A and then play around this value. 

5.2 The bisymmetric 2-dimensional case n = 2 with an 
exponential kernel 

Let us now consider a 2-diniensional bisymmetric Hawkes 
process introduced in Section 14.4.21 Both the diagonal 
term 0*^'^) and the anti-diagonal term (t)^"") (cf Eq. ([55)) ) 
have an exponential form 

^ = 0f ^ a,e-^^H,>, (42) 

= 012 = 021 ^ «^e-^»*i,>o. 

We use the following parameters for the simulations, ad = 
0.5, (3d = 8, Qfa = 1, /3a = 4, and fu,^ = fj,"^ = 1 (conse- 
quently A = 1.45). We simulate about 60000 jumps for 
each of the two components of the Hawkes process. The 
Figure [3] below shows the theoretical (solid lines) and esti- 
mated (o)versions of (j)d (in Fig.|3Ua) and 0a in Fig.[31[b)). 
We see that, as in the ID case, we get a reliable estimate 
of both kernels. 



[2], consists in describing high frequency price dynamics 
as the difference between two coupled Hawkes processes 
representing respectively up and down discrete price vari- 
ations. The authors emphasized that such model allows 
one to account for the main stylized facts characterizing 
the observed noise microstructure, namely the signature 
plot and the Epps effect (see end of Section 16.2.11 for a 
short remark on these effects). 

We use level 1 data (i.e, trades and best limit data) 
provided by QuantHouse Trading Solutionf0 of the 10- 
years Euro-Bund (Bund) and on the futures contracts on 
the Dax index. In this paper, we use 75 days covering 
the period between 2009-06-01 and 2009-09-15. The most 
liquid maturity is always chosen. In order to minimize sea- 
sonal effect, every day, only the data between 9 AM and 
11 AM (GMT) are kept during which the rate of incom- 
ing orders can be considered as stationary, moreover, we 
ignored the days with too little trades. The data has mil- 
lisecond accuracy and it has been treated in such a way 
that each market order is equivalent to exactly one trade 
0. 




Fig. 3. Non parametric estimation of the two dimensional 
Hawkes bisymmetric exponential kernel (Eq. (021)) from a 
unique realization with 60000 jumps. We chose ad ~ 0.5, 
Pd = 8, aa = 1, I3a = 4, and fi^ ^ fi^ = I {X ^ 1.4545) 
We used A = 0.05 and Tmax ~ 2. Estimated (o) and analytical 
kernel (solid line) are shown, (a) Estimation of the diagonal 



term 4't'^\ (b) Estimation of the anti-diagonal term 



6 Application to high frequency market data 



6.1 Estimation in the case of the one-dimensional 
model for Bund data 



In the following we test our method on the process of in- 
coming trade times (market orders) for the Bund Future. 
This is a 1-dimensional point process, consequently we 
use the estimator described in Section 14.4.11 Every single 
day, we compute an estimation of the function v^J^ with 
h = A ^ 0.1 and nA < 100. These quantities are then av- 
eraged on all the 75 days and we perform the estimation 
using this averaged quantity. 

The results are shown in the figure U) The so-obtained 
estimation of clearly displays a power law decay (Fig. 
mb)). A fit with the function at'^ gives a ~ 0.1 , and 

Let us point out that we studied the stability of this 
estimation as the value for A is changed. Table [1] clearly 
shows that, as A decreases, there is a pretty large dis- 
crepancy on the estimated values of both both a and l3. 
Though these values seem to stabilize when A reaches 0.1 
which could indicate that this choice is not far from the 
optimal value A* (see Section (57131) 



As mentioned in the introduction, Hawkes point processes 
have found many applications and notably as models for 
high frequency financial data. Indeed, because of the dis- 
crete and correlated nature of trade and limit order arrival 
times, point processes are natural models of market dy- 
namics at the microstructure level. One can mention for 
instance Ref. [16] where buy and sell trades arrivals are 
represented by a bivariate Hawkes process with exponen- 
tial kernels (see also dZ]). Another approach, developed in 



^ http://www.quanthouse.com/ 

^ When one market order hits several limit orders it results 
in several trades being reported. 

* Strictly speaking a decay kernel of the form at~^ is not 
admissible for our model. Indeed, its integral diverges both at 
f = and at infinity. So it is admissible as long as we consider 
that this behavior has two cut-off, for small and large t. This 
hypothesis will be implicitly made in the following. 
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Fig. 4. 1-dimensionaI non parametric estimation of (j) for the 
point process of incoming market orders of the Bund futures. 
We used A = h = Q.l and Tmax = 100. (a) The estimation 
clearly displays a slow decay, (b) Log-Log plot the figure (a) 
above reveals that, in good approximation, the kernel can be 
considered as decaying as . A power-law least-square fit 
(solid line) provides the exponent /? = —1.05. 

6.2 Estimation in the case of the two-dimensional 
model 

6.2.1 The two dimensional model 

In this Section, we apply our estimation framework to the 
process that was initially introduced in [2j for modeling 
the changes in the mid-price of a given asset. Let Xt be the 
mid-price, we decompose it as the sum of the cumulative 
positive jumps and the cumulative negative jumpf0 

^ strictly speaking, the jumps have not always the same am- 
plitude. Though for the Bund, most of them are 1-tick large 
it is not the case for the DAX data. If it is not the case Nl^ 
(resp. N^) just represent the point process (with jumps al- 
ways equal to 1) with the same arrival time as the upward 
(resp. downward) jumps of Xt 



A a 13 



1 


0.146276 


-1.41515 


0.5 


0.117503 


-1.30292 


0.1 


0.098624 


-1.05329 


0.05 


0.092975 


-1.03596 


0.01 


0.089227 


-0.99899 



Table 1. Results of the Power Law fit at^ , applied to the 
estimated Hawkes kernel for the rate of incoming market orders 
of the Bund Futures. We show the results for different values of 
the parameter A. As A decreases, the estimations for both a 
and P stabilize and seem to indicate that the choice of Z\ = 0.1 
is not far from the optimal value A* . 

Xt = N+~N,-. 
We look at the two dimensional point process 




As advocated in Ref. [2], 2 dimensional Hawkes processes 
are suited to model the so-defined Nt. A simple bisym- 
metric exponential kernel (with a null diagonal term) is 
able to reproduce remarkably the so-called signature plot. 
Moreover, in the case of 2 assets, each of them modeled by 
a 2-dimensional Hawkes and their interaction modeled by 
a simple symmetric cross term (leading to a 4-dimensional 
Hawkes process), these models have also been able to re- 
produce the so-called Epps effect. 

The Signature plot and the Epps effect are two of the 
main stylized facts of financial time-series at a microstruc- 
ture level. It is not the goal of this paper to go into more 
details about them, however, it is interesting to point out 
that the quantity Vt'^'^ on which our estimation procedure 
is based includes both the Signature plots (basically cor- 
responding to the diagonal terms of the h — >■ Wq'*' matrix 
function) and the Epps effect (basically corresponding to 
the non diagonal terms of the same function). 

6.2.2 Validating hypothesis H3 

In order to apply our estimation framework in dimension 
2, we first need to check the hypothesis H3. This hypoth- 
esis is two-folded. 

— First of all, on Fig. \E\ we show for each single day 
an estimation of — versus = A~ . The plot 
shows that yli ~ A2 with a very small variation and 
therefore that we are within our assumption that S = 
AI with \ = A^= A^. 

— Secondly, we need to show that the kernel matrix di- 
agonalizes in a basis that is constant. Actually, Fig. 
ini shows that (and consequently the kernel) is, in 
good approximation bisymmetric which implies that 
it diagonalizes on a constant basis. Indeed we see that 

pjg_ Igjja)) is very close to v^^'^'^ (A, in 
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Fig. Eta)) for all r > 0. In the same way, we see that 

v^''^'^^ (o, in Fig. [nib)) is also (up to some estimation 

noise) equal to u^'^'^^ (A, in Fig. Mjo)) for all r > 
(same results are obtained for r < 0). 

Let us point out that the choice of the midpoint price 
series is motivated by two factors. First, if we choose the 
series of traded prices, we have, because of a non-zero 
spread value, an important "spurious" bouncing effect (os- 
cillation between best bid and best ask prices) that is very 
hard to capture by our modeling approach (see [2]) and 
we get negative decay functions on the diagonal of (fi: cf)^^ 
and (/)^^ (see Section 16.2.31 for a longer discussion on the 
influence of the bouncing effect on the kernel estimation). 
Second, if we choose the series of the last traded prices 
of buy orders only as in [2] , the bouncing artifact disap- 
pears, however the symmetry of A'^ and A~ is naturally 
no longer verified. The series of midpoint prices has the 
advantage of having a reduced bouncing effect and ofpre- 
senting identically distributed 7V+ and processefl 




0.02 



0.02 0.03 0.04 



0.05 0.06 

A+ 



0.07 0.08 0.09 



Fig. 5. Estimated = A+ versus A^ = A- for every single 
day (75 total) in the 2-dimensional model for mid-price of the 
Bund futures. Each dot represents a single day. The solid line 
corresponds to A~^ = A~ . We see that, in good approximation 
A~^ ~ A~ validating the first part of hypothesis H3. 



6.2.3 Kernel estimation 



We use the framework developed in Section 14.4.21 in order 
to estimate 6. 



Still, when actually going through the estimation process, 
for stability reasons, we chose A to be the average of the esti- 
mated A'^ and A~ and, similarly, u^''''^^ and u^-''''^^ have been 
averaged as well as vi'^^'^^ and vi''^'^^ 




Fig. 6. Estimation of for r > on 75 days (from 9am to 
11am) in the framework of the 2-dimensional model for mid- 
price of the Bund futures. We see that the matrix ni''^ is, in 
good approximation, bisymmetric. Same results would be ob- 
tained for r < 0. That completes (with Fig. [5]) the validation 
of hypothesis H3. (a) (o) and vi''^'''^ (A), (b) vi''^'^^ (o) 

(A). 



and vi'^^'^^ 



The result of the non parametric estimation is shown 
in figure [T] We immediately notice that the diagonal term 
= (^11 is an order of magnitude smaller than the non 
diagonal one <^*^°^ = and can be considered as being 
zero. This means that and N~ are not self exciting 
and exclusively mutually exciting. The log-log plot in Fig. 
[8] of the anti-diagonal term 0'"' — displays a power- 
law behavior at^ with a ~ 0.095 and /3 ~ —0.99 which 
is unsurprisingly close to the values we found for the 1- 
dimensional model in Section [6.11 

Figure [HI show the estimation of the bisymmetric ker- 
nel on Dax Futures time-series. Here, the diagonal kernel 
is of the same order (it can even be greater) than the 
anti-diagonal term. This property, that was not observed 
for the Bund, can be interpreted using tick size considera- 
tions. Indeed, the tick size of Dax Futures is well known to 
be very small in the sense that the agents trading on this 
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Fig. 7. Non parametric estimation of the bisymmetric kernel (j) 
for the mid-price of the Bund futures using 75 days (between 
9am to 11am). We used Z\ = /i = 0.5 and r^ax = 200. The 
diagonal term — appears to be negligible compared to 
the anti-diagonal term = confirming that the A'^+ and 
N- are mutually but not self exciting (a) Estimation of the 



diagonal term 



(b) Estimation of the anti-diagonal term 
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Fig. 8. Log-Log plot of the anti-diagonal kernel <^'^"' = (f)^^ 
(estimated for the Bund futures ) shown on Fig-Ob). It shows 
that, in good approximation, this term can be considered as 
decaying as A power law fit (solid line) gives the exponent 
/3~ -L 



asset only care for moves that are greater than 1 tick (or 
equivalently for successive moves of 1 tick in the same di- 
rection). This translates into the fact that, at the scale of 
1-tick, there is hardly no bouncing effect (negative auto- 
correlation of the returns) . On the contrary, on the Bund, 
the tick size is well known to be "too big", i.e., when the 
price goes up by 1-tick, most of the time the next move is a 
downward move, i.e., the bouncing effect is very strong. In 
the framework of our model, it is clear that the strongest 
the bouncing effect the smaller the anti-diagonal term. 
Actually, looking carefully at the Fig. [Tlja), the bouncing 



effect on the Bund is so strong that, though much smaller 
than the anti-diagonal kernel, the estimated diagonal ker- 
nel seems to become significantly negative (compared to 
the estimation noise) which defies the assumption of our 
model. Indeed, if 4>^^ becomes negative, there is a priori, in 
theory, no guarantee that At remains positive, and a nega- 
tive rate of arrival is unacceptable. Of course, in practice, 
since these negative values are much smaller than the anti- 
diagonal kernel, the probabilities for At to become negative 
are clearly negligible. The bouncing effect is most appar- 
ent on the series of trade prices (bouncing between best 
ask and best bid prices) and while it is heavily dampened 
when we use the series of midpoint prices, we see that it 
is still too strong to be fully captured by our model (see 
the discussions in and in ^). 



(a) 



(b) 



Fig. 9. Non parametric estimation of the bisymmetric kernel 
4> for the mid-price of the Dax futures using 75 days (between 
9am to 11am). We used A = h = 0.5 and Tmax = 200. Contrar- 
ily to the Bund kernel (Fig. [7|), the diagonal term 0*'*' = <^^^ 
appears to be of the same order as the anti-diagonal term 
0'"' = (f)^^ . This is due to the tick size of the Dax which is 
much smaller than the tick size of the Bund, (a) Estimation 
of the diagonal term (j)f^ j3 ~ —1.2 . (b) Estimation of the 



anti-diagonal term /3 



-0.8. 



7 Conclusion and prospects 

In this paper we have introduced a non parametric method 
to estimate the shape of the self-exciting kernels for sym- 
metric Hawkes processes. Our method can be implemented 
very easily since it relies on the computation of the empiri- 
cal covariance matrix and mainly uses Fourier transforms. 
As illustrated on specific numerical examples (ID and 2D), 
it provides reliable results for series of about 10^ events 
long. This method can be very helpful to get a precise 
idea of the kernel functional shape before proceeding to 
a classical parametric (e.g. maximum likelihood) estima- 
tion. In a future work, we will consider a natural extension 
of the method in order to account for random marks as- 
sociated with each events as e.g. in the ETAS model for 
earthquakes p5[|T5] . 

As far as financial time series are concerned, we have 
shown that, for both Bund and Dax Futures high fre- 
quency data, the Hawkes kernels are slowly (power law) 
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decaying. Even if this observation has to be confirmed 
by further studies, it is noteworthy that the values of 
the kernel power-law exponent we found (/3 ~ —1) cor- 
responds to the onset of stationarity. Our findings can be 
of great interest since they allow one to finely describe, at 
high frequency, the market activity as a long-memory self- 
exciting process. This can notably allow one to improve 
the results of Ref. [2^ where the authors restrict themselves 
to exponential kernels to reproduce noise microstructure 
main features using Hawkes processes. On a more general 
ground, one can hope that our findings will be helpful to 
bridge the gap between a theory of price variations at the 
microstructure level and the standard coarse scale models 
with long range correlated volatility. 
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